The introduction section should relay what you are attempting to accomplish. It would include a statement of the business, science, research, or personal interest you have that leads to analyzing the data you’ve chosen. It should provide enough background to your work such that a reader would not need to load your data to understand your report. Like the simulation project, you can assume the reader is familiar with the course concepts, but not your data. Some things to consider:
What is this data? Where did it come from? What are the variables? Why is it interesting to you? Why are you creating a model for this data? What is the goal of this model?
The methods section should contain the bulk of your “work.” This section will contain the bulk of the R code that is used to generate the results. Your R code is not expected to be perfect idiomatic R, but it is expected to be understood by a reader without too much effort. Use RMarkdown and code comments to your advantage to explain your code if needed.
This section should contain any information about data preparation that is performed to the original data before modelling. Then you will apply methods seen in class, which may include some of the following but are not limited to:
- Multiple linear regression
- Dummy variables
- Interaction
- Residual diagnostics
- Outlier diagnostics
- Transformations
- Polynomial regression
- Model selection

Your task is not to use as many methods as possible. Your task is to use appropriate methods to find a good model that can correctly answer a question about the dataset, and then to communicate your result effectively. Some possible items to be discussed:
- Description of the original data file, including descriptions of all relevant variables.
- Description of additional data preparation that you performed.
- Description of the process you chose to follow.
- Narrative of your step-by-step decision-making process throughout the analysis as you adjusted the model and attempted to validate model assumptions.
# install necessary packages
# install.packages("readxl")
# install.packages("sqldf")
# install.packages("corrplot")
library(readr)
library("readxl")
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Could not load tcltk. Will use slower R code instead.
## Loading required package: RSQLite
library(glue)
library("corrplot")
## corrplot 0.84 loaded
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(leaps)
library(MASS)
options(digits=6)
options(warn=-1)
# price of bitcoin
Bitcoin <- read_csv("Bitcoin Historical Data - Investing.com.csv")
## Parsed with column specification:
## cols(
## Date = col_character(),
## Price = col_double(),
## Open = col_double(),
## High = col_double(),
## Low = col_double(),
## Vol = col_character(),
## Change = col_character(),
## left = col_double(),
## right = col_character(),
## Volume = col_double()
## )
Bitcoin$Date = as.Date(Bitcoin$Date, format="%b %d, %Y")
Bitcoin$month <- strftime(Bitcoin$Date, "%m")
Bitcoin$year <- strftime(Bitcoin$Date, "%Y")
plot(Price ~ Date, data=Bitcoin, type = "l")
plot(Volume ~ Date, data=Bitcoin, type = "l")
# Incumbent president
President <- read_csv("President.csv")
## Parsed with column specification:
## cols(
## Date = col_character(),
## president_name = col_character(),
## president_code = col_double()
## )
President$Date = as.Date(President$Date, format="%b %d, %Y")
plot(president_code ~ Date, data=President)
The president code is a categorical variable that classifies the United States' incumbent president between 2010 and 2020 as "0" (Barack Obama) or "1" (Donald Trump). We are interested in examining the impact of these two presidents' respective trade and monetary policies on the financial market. In particular, Trump's infamous "covfefe" tweets have been pushing markets around, and we would like to investigate whether they affect Bitcoin traders as well.
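As a reminder of how R handles such a variable, here is a minimal sketch on hypothetical toy numbers (not the real data): once the code is a factor, its coefficient is simply the mean shift between the two levels.

```r
# Toy illustration (hypothetical numbers, not the real data): a 0/1 factor
# becomes a dummy variable, and its coefficient is the mean shift between levels.
toy <- data.frame(price = c(10, 12, 20, 22),
                  president_code = factor(c(0, 0, 1, 1)))
cf <- coef(lm(price ~ president_code, data = toy))
cf["(Intercept)"]      # mean price under level 0: 11
cf["president_code1"]  # shift under level 1: +10
```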
# Interest rate
The_interest_rate_for_the_United_States<- read_csv("The interest rate for the United States, from 2010 - 2020.csv")
## Parsed with column specification:
## cols(
## Date = col_date(format = ""),
## interest_rate = col_double()
## )
The_interest_rate_for_the_United_States$month <- strftime(The_interest_rate_for_the_United_States$Date, "%m")
The_interest_rate_for_the_United_States$year <- strftime(The_interest_rate_for_the_United_States$Date, "%Y")
plot(interest_rate ~ Date, data=The_interest_rate_for_the_United_States, type = "o")
Interest rates are considered a crucial factor influencing investors' willingness to invest: the longer interest rates stay low, the more investors tend to deploy leverage. This March, the price of Bitcoin rallied to $5,940 as the United States Federal Reserve cut interest rates to 0.
# Exchange rates
US_dollar_yuan_exchange_rate <- read_csv("US dollar-yuan exchange rate.csv")
## Parsed with column specification:
## cols(
## Date = col_character(),
## exchange_rate = col_double()
## )
US_dollar_yuan_exchange_rate$Date = as.Date(US_dollar_yuan_exchange_rate$Date, format="%m/%d/%y")
plot(exchange_rate ~ Date, data = US_dollar_yuan_exchange_rate, type = 'l')
Euro_to_US_exchange_rate <- read_csv("euro-dollar-exchange-rate-historical-chart.csv")
## Parsed with column specification:
## cols(
## Date = col_date(format = ""),
## exchange_rate = col_double(),
## X3 = col_double()
## )
Euro_to_US_exchange_rate$Date = as.Date(Euro_to_US_exchange_rate$Date, format="%m/%d/%y")
Euro_to_US_exchange_rate$exchange_rate = 1/Euro_to_US_exchange_rate$exchange_rate
plot(exchange_rate ~ Date, data = Euro_to_US_exchange_rate, type = 'l')
Exchange rates also play a relevant role in the financial market, as they affect foreign investors' sentiment and transaction costs. For example, China has introduced several capital controls with regard to Bitcoin speculation. Given that more than 90% of bitcoins are traded with the yuan, and more than 70% of the mining takes place in China, we believe the exchange rate of the US dollar with the yuan is an influential factor in determining Bitcoin prices.
# Gold price
Gold <- read_csv("Gold Futures Historical Data.csv")
## Parsed with column specification:
## cols(
## Date = col_character(),
## Price = col_double(),
## Open = col_double(),
## High = col_double(),
## Low = col_double(),
## Vol. = col_character(),
## `Change %` = col_character()
## )
Gold$Date = as.Date(Gold$Date, format="%b %d, %Y")
plot(Price ~ Date, data=Gold, type='l')
Gold has been a hedge and safe haven for investors. Since Bitcoins usually react positively to great financial movements, we would like to study bitcoin’s relationship with precious metals price in a dynamic environment and investigate whether bitcoins can be a hedge or safe haven asset under market uncertainty scenarios.
From this dataset, we will use the Volume and Price columns.
# Stock market indices
Dow_Jones_Industrial_Average <- read_csv("Dow Jones Industrial Average Historical Data.csv")
## Parsed with column specification:
## cols(
## Date = col_character(),
## Price = col_number(),
## Open = col_number(),
## High = col_number(),
## Low = col_number(),
## Vol. = col_character(),
## `Change %` = col_character()
## )
Dow_Jones_Industrial_Average$Date = as.Date(Dow_Jones_Industrial_Average$Date, format="%b %d, %Y")
plot(Price ~ Date, data=Dow_Jones_Industrial_Average, type = 'l')
NYSE_Composite_Index <- read_csv("NYSE Composite Historical Data.csv")
## Parsed with column specification:
## cols(
## Date = col_character(),
## Price = col_number(),
## Open = col_number(),
## High = col_number(),
## Low = col_number(),
## Vol. = col_character(),
## `Change %` = col_character()
## )
NYSE_Composite_Index$Date = as.Date(NYSE_Composite_Index$Date, format="%b %d, %Y")
plot(Price ~ Date, data=NYSE_Composite_Index, type = 'l')
Dow Jones Industrial Average and NYSE Composite Index are major U.S. stock indices. We want to investigate the relationship between Bitcoin and these stock market barometers and examine whether they behave differently under market turmoil.
From these datasets, we will use the Price column.
# Oil prices
WTI_oil_price <- read_csv("Crude Oil WTI Futures Historical Data.csv")
## Parsed with column specification:
## cols(
## Date = col_character(),
## Price = col_double(),
## Open = col_double(),
## High = col_double(),
## Low = col_double(),
## Vol. = col_character(),
## `Change %` = col_character()
## )
WTI_oil_price$Date = as.Date(WTI_oil_price$Date, format="%b %d, %Y")
# colnames(WTI_oil_price)[2] <- "WTI_oil_price"
plot(Price ~ Date, data=WTI_oil_price, type = 'l')
Brent_Oil_price <- read_csv("Brent Oil Futures Historical Data.csv")
## Parsed with column specification:
## cols(
## Date = col_character(),
## Price = col_double(),
## Open = col_double(),
## High = col_double(),
## Low = col_double(),
## Vol. = col_character(),
## `Change %` = col_character()
## )
Brent_Oil_price$Date = as.Date(Brent_Oil_price$Date, format="%b %d, %Y")
# colnames(Brent_Oil_price)[2] <- "Brent_oil_price"
plot(Price ~ Date, data=Brent_Oil_price, type = 'l')
Oil price plays an important role in the economy due to increasing global demand for oil to support industrialization and urbanization. Higher oil prices tend to make production more expensive for businesses and therefore raise commodity prices. Oil price increases are also generally thought to increase inflation and reduce economic growth. We want to see how Bitcoin is related to this market influencer.
Because we are using gold prices, stock prices, oil prices, etc. to predict the Bitcoin price, it is reasonable to expect a delay of a few days before the Bitcoin price reflects what happens in the stock and futures markets. For this model, we set the response time for the Bitcoin price to 15 days, i.e. we assume the Bitcoin price reacts about 15 days after the gold price, stock prices, oil prices, etc. change.
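The SQL join below implements this lag. The same 15-day alignment can also be sketched in base R by shifting each predictor's dates forward before merging (hypothetical toy frames here, not the real CSVs):

```r
# Hypothetical toy frames standing in for the real CSVs: shift the
# predictor's dates forward by 15 days, then merge on Date, so each
# Bitcoin price is paired with the gold price from 15 days earlier.
btc  <- data.frame(Date = as.Date("2020-01-16") + 0:4,
                   Price = c(8700, 8820, 8910, 8880, 9000))
gold <- data.frame(Date = as.Date("2020-01-01") + 0:4,
                   Gold_price = c(1520, 1525, 1530, 1528, 1540))
gold$Date <- gold$Date + 15
merged <- merge(btc, gold, by = "Date")
nrow(merged)  # 5: every Bitcoin observation found its 15-day-old gold price
```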
sql = glue("
select Bitcoin.Date,
Bitcoin.Price as Bitcoin_price,
Bitcoin.Volume as Transaction_volume,
exchange_China.exchange_rate as USD_Yuan_exchange_rate,
exchange_Euro.exchange_rate as USD_Euro_Exchange_Rate,
DJIA.Price as DJIA_index,
NYSE.Price as NYSE_index,
Brent_Oil_price.Price as brent_oil_price,
WTI_oil_price.Price as WTI_oil_price,
Gold.Price as Gold_price,
President.president_code,
interest.interest_rate,
Bitcoin.month,
Bitcoin.year
from Bitcoin
left outer join US_dollar_yuan_exchange_rate as exchange_China
on exchange_China.Date = Bitcoin.Date - 15
left outer join Euro_to_US_exchange_rate as exchange_Euro
on exchange_Euro.Date = Bitcoin.Date - 15
left outer join Dow_Jones_Industrial_Average as DJIA
on DJIA.Date = Bitcoin.Date - 15
left outer join NYSE_Composite_Index as NYSE
on NYSE.Date = Bitcoin.Date - 15
left outer join Brent_Oil_price
on Brent_Oil_price.Date = Bitcoin.Date - 15
left outer join WTI_oil_price
on WTI_oil_price.Date = Bitcoin.Date - 15
left outer join President
on President.Date = Bitcoin.Date - 15
left outer join Gold
on Gold.Date = Bitcoin.Date - 15
left outer join The_interest_rate_for_the_United_States as interest
on interest.month = Bitcoin.month
and interest.year = Bitcoin.year
where Bitcoin.Date between {as.numeric(as.Date('2014-06-01'))} and {as.numeric(as.Date('2020-04-30'))}
order by Bitcoin.Date asc")
combined_daily = sqldf(sql)
combined_daily$Date_posix = as.numeric(as.Date(combined_daily$Date))
combined_daily$president_code = as.factor(combined_daily$president_code)
combined_weekdays = na.omit(combined_daily)
monthly_volume_df = sqldf("
select month, year, sum(combined_daily.Transaction_volume) as Transaction_volume
from combined_daily
group by combined_daily.month, combined_daily.year")
combined_monthly = sqldf("
select Date,
Bitcoin_price,
monthly_volume_df.Transaction_volume,
USD_Yuan_exchange_rate,
USD_Euro_Exchange_Rate,
DJIA_index,
NYSE_index,
brent_oil_price,
WTI_oil_price,
Gold_price,
president_code,
interest_rate,
Date_posix,
combined_weekdays.year
from combined_weekdays
join monthly_volume_df
on monthly_volume_df.month = combined_weekdays.month
and monthly_volume_df.year = combined_weekdays.year
where combined_weekdays.Date in
(select max(combined_weekdays.Date)
from combined_weekdays
group by combined_weekdays.month, combined_weekdays.year)
")
combined_weekdays = combined_weekdays[!(names(combined_weekdays) %in% c("month"))]
# combined_monthly
# plot(combined_monthly$Date, combined_monthly$Bitcoin_price, type = "l", col = "blue")
Bitcoin was not widely traded before 2014 and did not really count as a financial product until then, so we use only the data from 2014 to 2020 and ignore earlier observations.
In order to test the effectiveness of our models, we randomly select half of the observations as our training dataset and use the other half as our test dataset.
data = combined_weekdays
set.seed(42)  # fix the random split so the results are reproducible
trn_idx = sample(nrow(data), nrow(data) / 2)
# sample the monthly data separately: it has far fewer rows than the daily data
monthly_trn_idx = sample(nrow(combined_monthly), nrow(combined_monthly) / 2)
monthly_train = combined_monthly[monthly_trn_idx, ]
monthly_test = combined_monthly[-monthly_trn_idx, ]
weekday_train = combined_weekdays[trn_idx, ]
weekday_test = combined_weekdays[-trn_idx, ]
train_data = weekday_train
# train_data = monthly_train
Helper functions
model_names = c()
model_params_num = c()
model_rss = c()
model_adjr2 = c()
model_rmse = c()
test_rmse = c()
store_result = function(model, model_name, log = FALSE){
rmses = calc_rmse(model, data, trn_idx, log)
model_names <<- c(model_names, model_name)
model_params_num <<- c(model_params_num, length(coefficients(model)))
model_rss <<- c(model_rss, deviance(model))
model_adjr2 <<- c(model_adjr2, summary(model)$adj.r.squared)
model_rmse <<- c(model_rmse, rmses[1])
test_rmse <<- c(test_rmse, rmses[2])
}
calc_rmse = function(model, data, trn_idx, log = FALSE) {
data_train = data[trn_idx, ]
data_test = data[-trn_idx, ]
pred = predict(model, data_train)
if(log){
pred = exp(pred)
}
actual = data_train$Bitcoin_price
train_rmse = sqrt(mean((pred - actual)^2))
pred = predict(model, data_test)
if(log){
pred = exp(pred)
}
actual = data_test$Bitcoin_price
test_rmse = sqrt(mean((pred - actual)^2))
c(train_rmse, test_rmse)
}
plots_model = function(model){
par(mfrow = c(1, 2))
plot(fitted(model), resid(model), col = "grey", pch = 20,
xlab = "Fitted", ylab = "Residuals", main = "Fitted vs Residuals")
abline(h = 0, col = "darkorange", lwd = 2)
qqnorm(resid(model), col = "darkgrey")
qqline(resid(model), col = "dodgerblue", lwd = 2)
}
plot_prediction = function(model, data, trn_idx, log=FALSE) {
par(mfrow = c(1, 1))
data_test = data[-trn_idx, ]
pred = predict(model, data)
if(log){
pred = exp(pred)
}
plot(data$Date, data$Bitcoin_price, type = "l", col = "blue")
points(data$Date, pred, type = "l", col = "green")
title(main = deparse(substitute(model)))
legend("topleft",
legend=c("Actual", "Predicted"),
col=c("blue", "green"), lty=1:1, cex=0.8,
text.font=4, bg='antiquewhite')
}
Our potential predictors of the Bitcoin price are the variables prepared above: the transaction volume, the USD-yuan and USD-euro exchange rates, the DJIA and NYSE indices, the Brent and WTI oil prices, the gold price, the president code, the interest rate, and the date.
We'll use the pairs() plot to determine which variables may benefit from a quadratic or logarithmic relationship with the response.
pairs(data[!(names(data) %in% c("Date", "month","year"))], col = "dodgerblue")
### Transformation

Next, we will use the boxcox() function to find the best transformation of the form considered by the Box-Cox method.
model = lm(Bitcoin_price ~ . ,data=combined_monthly)
boxcox(model,
lambda = seq(-0.5, 0.5, by = 0.005),
plotit = TRUE)
We see a very high log-likelihood when λ is between about -0.3 and 0.1, which recommends a log transformation.
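The optimal λ can also be read off programmatically rather than by eye. A small self-contained sketch (synthetic data whose true transformation is the log, not the real model):

```r
library(MASS)

# Synthetic data built so that log(y) is linear in x; Box-Cox should then
# pick a lambda near 0, i.e. recommend the log transformation.
set.seed(1)
x <- runif(100, 1, 10)
y <- exp(0.5 * x + rnorm(100, sd = 0.2))
bc <- boxcox(lm(y ~ x), lambda = seq(-1, 1, 0.01), plotit = FALSE)
(lambda_hat <- bc$x[which.max(bc$y)])  # close to 0
```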
In addition to the pairs plot, we use a correlation matrix to examine the pairwise correlations among our predictors and the response.
# find correlation between all of these predictors
log_transform = combined_monthly
log_transform$Transaction_volume = log(log_transform$Transaction_volume)
M <- cor(x=log_transform[!(names(log_transform) %in% c("Date", "month","year", "president_code"))], use = "everything", method="pearson") # get correlations
# M
corrplot(M, method = "circle")
It seems that the transaction volume, DJIA_index, NYSE_index, president code, interest rate, and date all have a positive relationship with the Bitcoin price, so we will pay more attention to these predictors.
Meanwhile, the NYSE Composite Index and the Dow Jones Industrial Average are highly correlated with each other, as are the Brent and WTI oil futures. This intercorrelation might be harmful to our model.
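This kind of intercorrelation can be quantified with variance inflation factors. A minimal sketch on toy collinear data (the `vif_of` helper below is hypothetical; `car::vif()` offers the same diagnostic for a fitted model):

```r
# Toy data: x2 nearly duplicates x1 (like NYSE vs DJIA); x3 is independent.
set.seed(2)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)
x3 <- rnorm(100)
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on all the other predictors.
vif_of <- function(j, X) {
  r2 <- summary(lm(X[, j] ~ ., data = as.data.frame(X[, -j])))$r.squared
  1 / (1 - r2)
}
(vifs <- sapply(1:3, vif_of, X = X))  # x1 and x2 inflated; x3 near 1
```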
The results section should contain numerical or graphical summaries of your results. You should report a final model you have chosen. There is not necessarily one, singular correct model, but certainly some methods and models are better than others in certain situations. You may use any methods we studied this semester to complete this task, and provide evidence that your final choice of model is a good one. Some possible items to be discussed:
Let's first use full_mlr_model as a starting point. It is not a good model, but we will try to improve on it using parameter transformations, variable selection, and model building techniques.
full_mlr_model = lm(Bitcoin_price ~ ., train_data[!(names(data) %in% c("Date", "month"))])
plots_model(full_mlr_model)
store_result(full_mlr_model, "full_mlr_model")
As shown by the two plots above, the variance is not constant and linearity is also violated. According to the Q-Q plot, normality of the errors is violated, too.
We need to find better models.
Therefore, we will re-select our predictors using regsubsets().
Because there are extreme values and peaks in the Bitcoin price, we will consider adding quadratic terms to better capture the trends in Bitcoin_price.
Since the number of parameters is not too large, we will use an exhaustive search to find the most significant models by calling regsubsets().
exhaustive_search = summary(regsubsets(Bitcoin_price ~ Transaction_volume + USD_Yuan_exchange_rate + USD_Euro_Exchange_Rate + DJIA_index + NYSE_index + brent_oil_price + WTI_oil_price + Gold_price + president_code + interest_rate + Date_posix + I(Transaction_volume^2) + I(Date_posix ^ 2) + I(USD_Yuan_exchange_rate ^ 2) + I(USD_Euro_Exchange_Rate ^ 2) + I(DJIA_index ^ 2) + I(NYSE_index ^ 2) + I(brent_oil_price ^ 2) + I(WTI_oil_price ^ 2) + I(Gold_price ^ 2) + I(interest_rate ^ 2), data = data[!(names(data) %in% c("Date", "month"))], nvmax=8))
# exhaustive_search$which
We limit the search to models with at most 8 predictors (nvmax = 8). Looking at the RSS and adjusted R-squared below, both keep improving as the number of predictors increases up to eight.
exhaustive_search$rss
## [1] 5953716303 5406212252 4862921126 4720408716 4594155600 4393951274 4326940837
## [8] 4240538876
exhaustive_search$adjr2
## [1] 0.734974 0.759179 0.783231 0.789438 0.794928 0.803728 0.806588 0.810318
(best_r2_ind = which.max(exhaustive_search$adjr2))
## [1] 8
exhaustive_search$which[best_r2_ind, ]
## (Intercept) Transaction_volume
## TRUE TRUE
## USD_Yuan_exchange_rate USD_Euro_Exchange_Rate
## TRUE FALSE
## DJIA_index NYSE_index
## TRUE TRUE
## brent_oil_price WTI_oil_price
## FALSE FALSE
## Gold_price president_code1
## FALSE FALSE
## interest_rate Date_posix
## FALSE FALSE
## I(Transaction_volume^2) I(Date_posix^2)
## FALSE FALSE
## I(USD_Yuan_exchange_rate^2) I(USD_Euro_Exchange_Rate^2)
## FALSE TRUE
## I(DJIA_index^2) I(NYSE_index^2)
## TRUE TRUE
## I(brent_oil_price^2) I(WTI_oil_price^2)
## FALSE FALSE
## I(Gold_price^2) I(interest_rate^2)
## FALSE TRUE
Based on the results of the exhaustive search, we fit exhaustive_search_model.
exhaustive_search_model = lm(Bitcoin_price ~ Transaction_volume + USD_Yuan_exchange_rate + DJIA_index + NYSE_index + I(USD_Euro_Exchange_Rate^2) + I(DJIA_index^2) + I(NYSE_index^2) + I(interest_rate^2), data = train_data)
plots_model(exhaustive_search_model)
store_result(exhaustive_search_model, "exhaustive_search_model")
Unfortunately, linearity and constant variance are still violated. Therefore, we will use the logarithm as a variance-stabilizing transformation.
log_model = lm(log(Bitcoin_price) ~ . , data = train_data)
store_result(log_model, "log_model", log = TRUE)
log_full_interaction_model = lm(log(Bitcoin_price) ~ (. ^ 2) , data = train_data)
plots_model(log_full_interaction_model)
store_result(log_full_interaction_model, "log_full_interaction_model", log = TRUE)
After applying the log transformation and adding interactions, the mean of the residuals is around zero, so linearity is preserved.
Equal variance also seems to be preserved because the variance looks roughly constant. From the Q-Q plot, the distribution of the residuals looks much better than in previous models, but normality of the errors is still violated at extreme values.
plot_prediction(log_full_interaction_model, data, trn_idx, log = TRUE)
Since we simply added all interaction terms to our model, it looks like we are overfitting around the year 2017. We would also like to reduce the number of predictors to simplify our model.
n = length(resid(log_full_interaction_model))
log_full_int_mod_back_bic = step(log_full_interaction_model, direction = "backward", k = log(n), trace = 0)
log_full_int_mod_back_aic = step(log_full_interaction_model, direction = "backward", trace = 0)
log_interaction_only_year = lm(log(Bitcoin_price) ~ (. * year) , data = train_data)
n = length(resid(log_interaction_only_year))
log_groupby_year_back_bic = step(log_interaction_only_year, direction = "backward", k = log(n), trace = 0)
store_result(log_groupby_year_back_bic, "log_groupby_year_back_bic", log = TRUE)
log_groupby_year_back_aic = step(log_interaction_only_year, direction = "backward", trace = 0)
store_result(log_groupby_year_back_aic, "log_groupby_year_back_aic", log = TRUE)
We will use Cook's distance to identify influential points, refitting each model without the observations whose Cook's distance exceeds 4/n.
cooksdistance = cooks.distance(log_groupby_year_back_bic)
log_groupby_year_back_bic_fix = lm(log_groupby_year_back_bic,
data = train_data,
subset = cooksdistance < 4 / length(cooksdistance))
plot_prediction(log_groupby_year_back_bic_fix, data, trn_idx, log = TRUE)
store_result(log_groupby_year_back_bic_fix, "log_groupby_year_back_bic_fix", log = TRUE)
cooksdistance = cooks.distance(log_groupby_year_back_aic)
log_groupby_year_back_aic_fix = lm(log_groupby_year_back_aic,
data = train_data,
subset = cooksdistance < 4 / length(cooksdistance))
plot_prediction(log_groupby_year_back_aic_fix, data, trn_idx, log = TRUE)
store_result(log_groupby_year_back_aic_fix, "log_groupby_year_back_aic_fix", log = TRUE)
cooksdistance = cooks.distance(log_full_int_mod_back_bic)
log_int_mod_back_bic_fix = lm(log_full_int_mod_back_bic,
data = train_data,
subset = cooksdistance < 4 / length(cooksdistance))
plots_model(log_int_mod_back_bic_fix)
plot_prediction(log_int_mod_back_bic_fix, data, trn_idx, log = TRUE)
store_result(log_int_mod_back_bic_fix, "log_int_mod_back_bic_fix", log = TRUE)
cooksdistance = cooks.distance(log_full_int_mod_back_aic)
log_int_mod_back_aic_fix = lm(log_full_int_mod_back_aic,
data = train_data,
subset = cooksdistance < 4 / length(cooksdistance))
plots_model(log_int_mod_back_aic_fix)
plot_prediction(log_int_mod_back_aic_fix, data, trn_idx, log = TRUE)
store_result(log_int_mod_back_aic_fix, "log_int_mod_back_aic_fix", log = TRUE)
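The 4/n rule of thumb used above can be illustrated on toy data (a hypothetical regression with one planted influential point, not the real model):

```r
# One planted outlier at the high-leverage end; the 4/n cutoff flags it,
# and the model is refit without the flagged observations.
set.seed(3)
x <- 1:50
y <- 2 * x + rnorm(50)
y[50] <- y[50] + 40
fit  <- lm(y ~ x)
cd   <- cooks.distance(fit)
keep <- cd < 4 / length(cd)
sum(!keep)                       # at least the planted point is flagged
fit_fix <- lm(y ~ x, subset = keep)
```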
model_list <- data.frame("model names"=model_names, "num of parameters" = model_params_num, "model_adjr2" = model_adjr2, "model_rmse"=model_rmse, "test_rmse"=test_rmse)
knitr::kable(model_list, digits = getOption("digits"), row.names = NA,
col.names = colnames(model_list), align = "c", caption = NULL, label = NULL,
format.args = list(), escape = TRUE)
| model.names | num.of.parameters | model_adjr2 | model_rmse | test_rmse |
|---|---|---|---|---|
| full_mlr_model | 18 | 0.814725 | 1700.670 | 1759.643 |
| exhaustive_search_model | 10 | 0.798801 | 1782.245 | 1681.731 |
| log_model | 10 | 0.798801 | 1782.245 | 1681.731 |
| log_full_interaction_model | 10 | 0.798801 | 1782.245 | 1681.731 |
| log_groupby_year_back_bic | 51 | 0.994384 | 737.881 | 783.204 |
| log_groupby_year_back_aic | 91 | 0.994714 | 703.224 | 748.628 |
| log_groupby_year_back_bic_fix | 51 | 0.994288 | 740.657 | 746.214 |
| log_groupby_year_back_aic_fix | 91 | 0.994847 | 700.439 | 686.086 |
| log_int_mod_back_bic_fix | 75 | 0.996128 | 540.544 | 583.248 |
| log_int_mod_back_aic_fix | 157 | 0.996685 | 487.353 | 484.915 |
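Given such a comparison table, the final model can be chosen programmatically as the one with the lowest held-out RMSE; a sketch on a hypothetical two-row stand-in for model_list:

```r
# Hypothetical stand-in for the model_list data frame built above.
model_list <- data.frame(
  model_names = c("full_mlr_model", "log_int_mod_back_aic_fix"),
  test_rmse   = c(1759.643, 484.915),
  stringsAsFactors = FALSE
)
(best <- model_list[which.min(model_list$test_rmse), "model_names"])
# "log_int_mod_back_aic_fix"
```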
The discussion section should contain discussion of your results and should frame your results in the context of the data. How is your final model useful?
The appendix section should contain code and analysis that is used, but that would have otherwise cluttered the report or is not directly related to the choice of model. Do not simply dump code in here. Only utilize the appendix to supplement the primary focus of the report. The appendix should also conclude with the names of the group members.
Write in complete sentences and pay attention to grammar, spelling, readability and presentation. If you include a table or chart, make sure you say something about it. If you’re not discussing a result, then it doesn’t belong in your report.
Submit the following three items in a .zip file just as you do in homework assignments:

- your selected data,
- a .Rmd program file, and
- the project report (.html file).

# Reference